Learning Structured Predictors from Bandit Feedback for Interactive NLP
Abstract
Structured prediction from bandit feedback describes a learning scenario where, instead of having access to a gold standard structure, a learner receives only partial feedback in the form of the loss value of a predicted structure. We present new learning objectives and algorithms for this interactive scenario, focusing on convergence speed and ease of elicitability of feedback. We present supervised-to-bandit simulation experiments for several NLP tasks (machine translation, sequence labeling, text classification), showing that bandit learning from relative preferences eases feedback strength and yields improved empirical convergence.
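As a rough illustration of the scenario described in the abstract, the following is a minimal sketch of a supervised-to-bandit simulation for text classification. It assumes a log-linear (softmax) model and a simulated 0/1 loss; the function names (`bandit_train`, `predict`), the toy data, and the hyperparameters are hypothetical. The update is the generic score-function gradient of the expected loss, which is not necessarily any of the objectives the paper proposes.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def class_scores(w, x):
    """Linear score of every class for feature vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def bandit_train(data, num_classes, num_features, epochs=200, lr=0.5, seed=0):
    """Expected-loss minimization from simulated bandit feedback (assumed setup)."""
    rng = random.Random(seed)
    w = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(epochs):
        for x, gold in data:
            probs = softmax(class_scores(w, x))
            # The learner samples one prediction from its current model.
            y = rng.choices(range(num_classes), weights=probs)[0]
            # Simulated feedback: only the loss of the sampled prediction is
            # revealed. The gold label is used solely to compute that loss.
            loss = 0.0 if y == gold else 1.0
            # Stochastic gradient of the expected loss:
            # loss * d/dw log p(y | x), with d/dw_k log p(y | x) = (1[k == y] - p_k) * x.
            for k in range(num_classes):
                coeff = loss * ((1.0 if k == y else 0.0) - probs[k])
                for j in range(num_features):
                    w[k][j] -= lr * coeff * x[j]
    return w


def predict(w, x):
    """Return the highest-scoring class."""
    scores = class_scores(w, x)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy data: one-hot features, one example per class.
    toy = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
    weights = bandit_train(toy, num_classes=3, num_features=3)
    print([predict(weights, x) for x, _ in toy])
```

The learner never sees the correct label directly; it only learns how costly its own sampled prediction was, which is why exploration by sampling from the model is part of the sketch.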
Similar resources
Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation
We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1 − BLEU loss ...
Structured Prediction via Learning to Search under Bandit Feedback
We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the ...
Bandit Structured Prediction for Neural Sequence-to-Sequence Learning
Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attentio...
An Interactive Tool for Natural Language Processing on Clinical Text
Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from clinical text, current systems generally do not support model revision based on feedback from domain experts. We present a prototype tool that allows end users to...
Counterfactual Risk Minimization
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. Unlike in supervised learning, where the algorithm receives training examples $(x_i, y_i^*)$ with annotated correct labels $y_i^*$, bandit feedback merely provides a cardinal reward $\delta_i \in \mathbb{R}$ for the prediction $y_i$ that the logging system made for context $x_i$. Such bandit feedback is ubiquitous in...
Publication date: 2016